Boosting ensembles with controlled emphasis intensity
Boosting ensembles have received much attention because of their high performance. However, they are also sensitive to adverse conditions, such as noisy environments or the presence of outliers. One way to fight this degradation is to modify the form of the emphasis weighting that is applied to train each new learner. In this paper, we propose a general form for that emphasis function which includes not only an error-dependent term and a term that depends on the proximity to the classification boundary, but also a constant value that serves to control how much emphasis is applied. Two convex combinations are used to combine these terms, which makes it possible to control their relative influence. Experimental results support the effectiveness of this general form of boosting emphasis.
This work has been partly supported by research grants CASI-CAM-CM (S2013/ICE-2845, DGUI-CM) and Macro-ADOBE (TEC2015-67719-P, MINECO).
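As a rough illustration of the kind of emphasis function described in this abstract, the following sketch combines an error term, a boundary-proximity term, and a constant through two convex combinations. The exponential forms, the mixing parameters `alpha` and `lam`, and the function name are assumptions for illustration, not the paper's actual notation:

```python
import numpy as np

def mixed_emphasis(f, t, alpha=0.3, lam=0.5):
    """Hypothetical emphasis weighting: two convex combinations mix an
    error-dependent term, a boundary-proximity term, and a constant.
    f: real-valued ensemble outputs; t: +/-1 targets."""
    err = np.exp((f - t) ** 2)       # grows with classification error
    prox = np.exp(-f ** 2)           # peaks near the decision boundary f = 0
    mixed = lam * err + (1 - lam) * prox            # first convex combination
    w = alpha * np.ones_like(f) + (1 - alpha) * mixed  # second: constant term
    return w / w.sum()               # normalize to a sampling distribution
```

Setting `alpha = 1` removes the emphasis entirely (uniform weights), while `alpha = 0` recovers a fully mixed emphasis; intermediate values graduate its intensity, which is the control the abstract refers to.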
Pre-emphasizing Binarized Ensembles to Improve Classification Performance
14th International Work-Conference on Artificial Neural Networks, IWANN 2017
Machine ensembles are learning architectures that offer high expressive capacity and, consequently, remarkable performance, owing to their high number of trainable parameters. In this paper, we explore and discuss whether binarization techniques are effective at improving standard diversification methods, and whether a simple additional trick, weighting the training examples, yields better results. Experimental results for three selected classification problems show that binarization enables standard direct diversification methods (bagging, in particular) to achieve better results, with even more significant performance improvements when the training samples are pre-emphasized. Some research avenues opened by this finding are mentioned in the conclusions.
This work has been partly supported by research grants CASI-CAM-CM (S2013/ICE-2845, DGUI-CM and FEDER) and Macro-ADOBE (TEC2015-67719-P, MINECO).
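A minimal sketch of the two ingredients this abstract combines: a binarization step (here one-vs-rest, one possible choice) and bootstrap bags drawn with probabilities proportional to pre-emphasis weights. The function names and the specific weighting scheme are illustrative assumptions; uniform weights recover plain bagging:

```python
import numpy as np

rng = np.random.default_rng(0)

def binarize_labels(y, positive):
    """One-vs-rest binarization: +1 for the chosen class, -1 otherwise."""
    return np.where(y == positive, 1, -1)

def pre_emphasized_bags(y, weights, n_bags=11):
    """Bootstrap resampling with probabilities proportional to the
    pre-emphasis weights; returns one index array per bag."""
    p = np.asarray(weights, dtype=float)
    p = p / p.sum()
    n = len(y)
    return [rng.choice(n, size=n, replace=True, p=p) for _ in range(n_bags)]
```

Each bag would then train one binary base learner; samples with larger pre-emphasis weights appear in more bags, so the ensemble concentrates on them.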
Enfatizado y diversificación en clasificación máquina (Emphasis and diversification in machine classification)
The exceptional capabilities of Boosting methods, especially the Real AdaBoost (RAB) algorithm, for solving decision and classification problems are universally recognized. This good performance stems from the progressive construction of a set of weak, unstable learners, combined linearly, that pay more attention to the samples that are hardest to classify. However, the corresponding emphasis that is applied can be inadequate, particularly under high noise levels or an abundant presence of out-of-margin samples ("outliers"). For these working scenarios, several modifications of the basic Boosting algorithm have been proposed to control the amount of emphasis applied, but none of them seems to deliver the expected results on imbalanced datasets, in the presence of outliers, or with asymmetric data distributions.
With this in mind, Chapter 2 first proposes a simple modification of the emphasis function of the standard RAB algorithm that takes into account not only the error of the sample to be classified, but also the classification errors of the samples closest to it. Chapter 3 then presents a generalization of the hybrid emphasis function used in versions of the RAB algorithm that weight the samples (through a mixing parameter) according to their classification error and their proximity to the boundary. This new emphasis function includes a constant term that serves to moderate the emphasis intensity, in other words, to limit the attention focused on the samples that are closest to the boundary or hardest to classify. The results obtained in Chapters 2 and 3 indicate that these modifications of the emphasis functions achieve better performance.
Subsequently, Chapter 4 proposes emphasizing the costs associated with the training samples to improve the classification results of ensembles based on standard diversification schemes and binarization. The results obtained in this chapter show how binarization techniques enable standard diversification methods (Bagging, in particular) to achieve better performance, with much more significant improvements when the training samples are emphasized beforehand.
This Doctoral Thesis concludes by enumerating its main contributions and suggesting open research lines.
The exceptional capabilities of Boosting methods, in particular of Real AdaBoost
(RAB) ensembles, for solving decision and classification problems are universally recognized. These capabilities come from progressively constructing weak, unstable learners that pay more attention to the samples that are harder to classify correctly, and combining them linearly. However, the corresponding emphasis can be inappropriate, in particular under intense noise or in the presence of outliers. For these scenarios, many modifications have been proposed to control the emphasis, but they show limited success on imbalanced or asymmetric problems. A simple way to deal with these situations is to modify the form of the emphasis weighting that is applied to train each new learner.
Firstly, in Chapter 2 we propose a simple modification of the well-known RAB emphasis function. The basic idea underlying this modification, which makes use of the neighborhood concept to reduce the above drawbacks, is to emphasize the samples according to their own errors and those of their neighbors. Next, in Chapter 3, we propose a general form of the emphasis which includes not only an error-dependent term and a term that depends on the proximity to the classification boundary, but also a constant value that serves to graduate the intensity of that mixed emphasis, limiting the increased attention paid to highly erroneous samples and to samples near the boundary. Experimental results obtained in both Chapter 2 and Chapter 3 support the effectiveness of these forms of Boosting emphasis.
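A minimal sketch of the neighborhood idea behind Chapter 2: each sample's emphasis mixes its own error with the mean error of its nearest neighbors, so an isolated erroneous sample (a likely outlier) is downweighted relative to a sample whose whole neighborhood is erroneous. The mixing parameter `mu`, the Euclidean k-NN choice, and the function name are illustrative assumptions:

```python
import numpy as np

def neighborhood_emphasis(X, errors, k=3, mu=0.5):
    """Hypothetical neighborhood-smoothed emphasis: convex combination of
    each sample's own error and the mean error of its k nearest neighbors."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)            # exclude each sample from its own neighborhood
    nn = np.argsort(d, axis=1)[:, :k]      # indices of the k nearest neighbors
    w = mu * errors + (1 - mu) * errors[nn].mean(axis=1)
    return w / w.sum()                     # normalize to a distribution
```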
In Chapter 4, we propose weighting the costs associated with the training examples in order to improve the classification results of ensembles based on standard diversification and binarization techniques. Experimental results show that binarization enables standard direct diversification methods (Bagging, in particular) to achieve better results, with even more significant performance improvements when the training samples are pre-emphasized.
This Doctoral Thesis concludes by enumerating its main contributions and suggesting new research lines arising from this work.
Programa Oficial de Doctorado en Multimedia y Comunicaciones. Thesis committee: Chair: Luis Vergara Domínguez; Secretary: Francisco Javier González Serrano; Member: Alberto Suárez González.
Word Sense Induction in the Arabic Language: A Self-Term Expansion Based Approach
Abstract. The aim of the word sense induction/discrimination task in natural language processing is to discover the sense associated with each instance of a given ambiguous word. In this paper we present an approach based on clustering a self-expanded version of the original dataset in order to tackle this problem. The self-expansion technique substitutes every term of the original corpus with a set of co-related terms computed by means of pointwise mutual information. Our proposal, which was tested for the English language, shows good performance for the Arabic language too, highlighting its language-independent character.
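A rough sketch of the PMI step described above: count how often terms co-occur within a sentence, score pairs by pointwise mutual information, and keep each term's highest-scoring co-occurring terms as its expansion set. The windowing choice (whole sentences), the function name, and `top_k` are illustrative assumptions:

```python
import math
from collections import Counter
from itertools import combinations

def pmi_expansion_sets(sentences, top_k=2):
    """Hypothetical self-term expansion: map each term to the co-occurring
    terms with highest pointwise mutual information, PMI(a, b) =
    log(p(a, b) / (p(a) * p(b))), estimated over sentence co-occurrence."""
    term_counts, pair_counts = Counter(), Counter()
    for sent in sentences:
        terms = set(sent)                      # count each term once per sentence
        term_counts.update(terms)
        pair_counts.update(frozenset(p) for p in combinations(sorted(terms), 2))
    n = len(sentences)

    def pmi(a, b):
        return math.log((pair_counts[frozenset((a, b))] / n)
                        / ((term_counts[a] / n) * (term_counts[b] / n)))

    expansion = {}
    for t in term_counts:
        related = [(u, pmi(t, u)) for u in term_counts
                   if u != t and pair_counts[frozenset((t, u))] > 0]
        related.sort(key=lambda x: -x[1])      # strongest co-relations first
        expansion[t] = [u for u, _ in related[:top_k]]
    return expansion
```

The expanded corpus would then replace (or augment) each original term with its expansion set before clustering the instances of the ambiguous word.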